AITopics | chinese word

The preservation and revitalization of endangered and extinct languages is a meaningful endeavor, conserving cultural heritage while enriching fields like linguistics and anthropology. However, these languages are typically low-resource, making their reconstruction labor-intensive and costly. This challenge is exemplified by Nushu, a rare script historically used by Yao women in China for self-expression within a patriarchal society. To address this challenge, we introduce NushuRescue, an AI-driven framework designed to train large language models (LLMs) on endangered languages with minimal data. NushuRescue automates evaluation and expands target corpora to accelerate linguistic revitalization. As a foundational component, we developed NCGold, a 500-sentence Nushu-Chinese parallel corpus, the first publicly available dataset of its kind. Leveraging GPT-4-Turbo, with no prior exposure to Nushu and only 35 short examples from NCGold, NushuRescue achieved 48.69% translation accuracy on 50 withheld sentences and generated NCSilver, a set of 98 newly translated modern Chinese sentences of varying lengths. A sample of both NCGold and NCSilver is included in the Supplementary Materials. Additionally, we developed FastText-based and Seq2Seq models to further support research on Nushu. NushuRescue provides a versatile and scalable tool for the revitalization of endangered languages, minimizing the need for extensive human input.

large language model, machine learning, translation, (22 more...)

arXiv.org Artificial Intelligence

2412.00218

Country: Asia > China (0.49)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Deep Learning, NLP, and Representations - colah's blog

#artificialintelligenceApr-8-2018, 02:57:41 GMT

In the last few years, deep neural networks have dominated pattern recognition. They blew the previous state of the art out of the water for many computer vision tasks. Voice recognition is also moving that way. But despite the results, we have to wonder… why do they work so well? In doing so, I hope to make accessible one promising answer as to why deep neural networks work. I think it's a very elegant perspective. A neural network with a hidden layer has universality: given enough hidden units, it can approximate any function. This is a frequently quoted – and even more frequently, misunderstood and applied – theorem.

neural network, representation, vector, (14 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

cw2vec: Learning Chinese Word Embeddings with Stroke n-gram Information

Cao, Shaosheng (Ant Financial Services Group; Singapore University of Technology and Design) | Lu, Wei (Singapore University of Technology and Design) | Zhou, Jun (Ant Financial Services Group) | Li, Xiaolong (Ant Financial Services Group)

AAAI ConferencesFeb-8-2018

We propose cw2vec, a novel method for learning Chinese word embeddings. It is based on our observation that exploiting stroke-level information is crucial for improving the learning of Chinese word embeddings. Specifically, we design a minimalist approach to exploit such features, by using stroke n-grams, which capture semantic and morphological level information of Chinese words. Through qualitative analysis, we demonstrate that our model is able to extract semantic information that cannot be captured by existing methods. Empirical results on the word similarity, word analogy, text classification and named entity recognition tasks show that the proposed approach consistently outperforms state-of-the-art approaches such as word-based word2vec and GloVe, character-based CWE, component-based JWE and pixel-based GWE.

information, machine learning, natural language, (20 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Genre: Research Report > Promising Solution (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Deep Learning, NLP, and Representations - colah's blog

#artificialintelligenceSep-1-2017, 22:35:11 GMT

In the last few years, deep neural networks have dominated pattern recognition. They blew the previous state of the art out of the water for many computer vision tasks. Voice recognition is also moving that way. But despite the results, we have to wonder… why do they work so well? In doing so, I hope to make accessible one promising answer as to why deep neural networks work. I think it's a very elegant perspective. A neural network with a hidden layer has universality: given enough hidden units, it can approximate any function. This is a frequently quoted – and even more frequently, misunderstood and applied – theorem.

artificial intelligence, machine learning, neural network, (16 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Deep Learning, NLP, and Representations - colah's blog

#artificialintelligenceApr-24-2016, 13:45:47 GMT

In the last few years, deep neural networks have dominated pattern recognition. They blew the previous state of the art out of the water for many computer vision tasks. Voice recognition is also moving that way. But despite the results, we have to wonder… why do they work so well? In doing so, I hope to make accessible one promising answer as to why deep neural networks work. I think it's a very elegant perspective. A neural network with a hidden layer has universality: given enough hidden units, it can approximate any function. This is a frequently quoted – and even more frequently, misunderstood and applied – theorem.

artificial intelligence, machine learning, neural network, (16 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

CHIME: An Efficient Error-Tolerant Chinese Pinyin Input Method

Zheng, Yabin (Tsinghua University) | Li, Chen (University of California, Irvine) | Sun, Maosong (Tsinghua University)

AAAI ConferencesJul-19-2011

Chinese Pinyin input methods are very important for Chinese language processing. In many cases, users may make typing errors. For example, a user wants to type in "shenme" (什么, meaning "what" in English) but may type in "shenem" instead. Existing Pinyin input methods fail in converting such a Pinyin sequence with errors to the right Chinese words. To solve this problem, we developed an efficient error-tolerant Pinyin input method called "CHIME'' that can handle typing errors. By incorporating state-of-the-art techniques and language-specific features, the method achieves a better performance than state-of-the-art input methods. It can efficiently find relevant words in milliseconds for an input Pinyin sequence.

chinese word, mistyped pinyin, pinyin, (15 more...)

AAAI Conferences

Twenty-Second International Joint Conference on Artificial Intelligence

Country: